sega_genesis_graphic_compression_formatsfandomcom-20200213-history
Example of Nemesis Compression
Here is an example of how to compress graphics using Nemesis compression. Example Tiles Say, for example, you wanted to compress this 3-tile set. These are the numerals 1-3 in the classic NIntendo Entertainment System (NES) font. The tiles are magnified 4x so you see the detail. For this example, say that the blue pixels are color 0 and the yellow pixels are color 1. Analyzing Runs The first row of the first tile has pixel values 00011000, which is grouped as follows: 0x3, 1x2, 0x3. For the first tile, you get the following runs: 00011000 = 0x3, 1x2, 0x3 00111000 = 0x2, 1x3, 0x3 00011000 = 0x3, 1x2, 0x3 00011000 = 0x3, 1x2, 0x3 00011000 = 0x3, 1x2, 0x3 00011000 = 0x3, 1x2, 0x3 01111110 = 0x1, 1x6, 0x1 00000000 = 0x8 For the second tile: 01111100 = 0x1, 1x5, 0x2 11000110 = 1x2, 0x3, 1x2, 0x1 00001110 = 0x4, 1x3, 0x1 00111100 = 0x2, 1x4, 0x2 01111000 = 0x1, 1x4, 0x3 11100000 = 1x3, 0x5 11111110 = 1x7, 0x1 00000000 = 0x8 For the third tile: 01111110 = 0x1, 1x6, 0x1 00001100 = 0x4, 1x2, 0x2 00011000 = 0x3, 1x2, 0x3 00111100 = 0x2, 1x4, 0x2 00000110 = 0x5, 1x2, 0x1 11000110 = 1x2, 0x3, 1x2, 0x1 01111100 = 0x1, 1x5, 0x2 00000000 = 0x8 Combining Runs, row-to-row Sometimes, the last run of a row and the first run of the next row can be combined. This happens when the pixel values in these runs are the same. 'Tile 1' The first tile is currently expressed as: 0x3, 1x2, 0x3, 0x2, 1x3, 0x3, 0x3, 1x2, 0x3, 0x3, 1x2, 0x3, 0x3, 1x2, 0x3, 0x3, 1x2, 0x3, 0x1, 1x6, 0x1, 0x8 Similar consecutive runs could then be combined as follows: 0x3, 1x2, 0x5, 1x3, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x4, 1x6, 0x8, 0x1 Note that runs cannot contain more than 8 pixels. 'Tile 2' 0x1, 1x5, 0x2, 1x2, 0x3, 1x2, 0x1, 0x4, 1x3, 0x1, 0x2, 1x4, 0x2, 0x1, 1x4, 0x3, 1x3, 0x5, 1x7, 0x1, 0x8 Becomes: 0x1, 1x5, 0x2, 1x2, 0x3, 1x2, 0x5, 1x3, 0x3, 1x4, 0x3, 1x4, 0x3, 1x3, 0x5, 1x7, 0x8, 0x1 'Tile 3' 0x1, 1x6, 0x1, 0x4, 1x2, 0x2, 0x3, 1x2, 0x3, 0x2, 1x4, 0x2, 0x5, 1x2, 0x1, 1x2, 0x3, 1x2, 0x1, 0x1, 1x5, 0x2, 0x8 Becomes: 0x1, 1x6, 0x5, 1x2, 0x5, 1x2, 0x5, 1x4, 0x7, 1x2, 0x1, 1x2, 0x3, 1x2, 0x2, 1x5, 0x8, 0x2 Combining Runs, tile-to-tile Here is the entire set of runs: 0x3, 1x2, 0x5, 1x3, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x4, 1x6, 0x8, 0x1 0x1, 1x5, 0x2, 1x2, 0x3, 1x2, 0x5, 1x3, 0x3, 1x4, 0x3, 1x4, 0x3, 1x3, 0x5, 1x7, 0x8, 0x1 0x1, 1x6, 0x5, 1x2, 0x5, 1x2, 0x5, 1x4, 0x7, 1x2, 0x1, 1x2, 0x3, 1x2, 0x2, 1x5, 0x8, 0x2 It can be further improved like so: 0x3, 1x2, 0x5, 1x3, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x4, 1x6, 0x8, 0x2 1x5, 0x2, 1x2, 0x3, 1x2, 0x5, 1x3, 0x3, 1x4, 0x3, 1x4, 0x3, 1x3, 0x5, 1x7, 0x8, 0x2 1x6, 0x5, 1x2, 0x5, 1x2, 0x5, 1x4, 0x7, 1x2, 0x1, 1x2, 0x3, 1x2, 0x2, 1x5, 0x8, 0x2 This makes our final data. Counting Runs Using the optimized data above, here is a count of our pixel runs. These runs are sorted in order from the least common to the most common: We can see that 1x2 is the most common run, with 12 occurrences, and there are four runs tied for one occurrence each. Building a Huffman Tree Next, the frequency counts above are transformed into a Huffman tree. Consider the tree in the form of a table. So far, it looks like this: To create a new node, take the two nodes with the least frequency, add their counts together, then make a new node with that count. In this example, the two least-frequency nodes are 1 and 2. Their counts are both 1, so the combined count is 2. Node 15 has a count of 2; its left child is node 1, and its right child is node 2. Then, remove nodes 1 and 2 from the list, and add node 15. Continue in this manner until only one node is left. That is the root. When finished, the table looks like this: Once the tree has been constructed, we can begin to figure out the codes for each of our runs. Finding the Codes The next step is to figure out what code corresponds to what run. #Look for the appropriate node in the left and right child columns. #If the node is in the left child column, prepend a 0 to the code. If the node is in the right child column, prepend a 1 to the code. #Figure out the parent node by looking in the node (leftmost) column of the same row in the table. #If the parent node is not the root (the very bottom row, which has the highest count), go back to step 1, looking for the node that was found in step 3. #At this point, the code is that of the run. Let's start with the most common run, 1x2. What is the code for it? *The 1x2 run is in node 14. *Node 14 is a right child, so the code so far is 1'''. *Node 14's parent node is node 25. *Node 25 is a left child, so the code so far is '''01. *Node 25's parent node is node 27. *Node 27 is the root, so stop here. The code for a 1x2 run is 01. When all the runs' codes are discovered, the codebook looks like this: This tree could be made canonical, which will be explained later. At this point, any code that is nine or more bits long, or starts with six 1-bits, is thrown out. The only run that meets either of these conditions is 1x7. This run will be encoded as an inline run. Now, it's time to construct our file. Storing Header Just take the number of tiles, and convert it to a two-byte value. There are 3 tiles in this set. 3 as a 16-bit hexadecimal number is just $0003, so the header bytes are 00 03. So far, our file looks like this: 00 03 Storing Huffman Tree In the binary tree, all runs of the same pixel value are grouped together. This is signalled by a byte equal to $80 plus the pixel value. There are two different pixel values in this file, $0 and $1. There are eight different runs of $0. Start with a byte of 80. For a run of 0x5, the higher nybble is the run length minus 1, so that is 4''', and the code length is 3 bits, so the lower nybble is '''3. The first byte is therefore 43. The second byte contains the code. The code for this run is 100. Pad this on the left with as many bits necessary to make a byte: 00000100. The second byte is 04. That means that this run is stored as 43 04. Here is a list of bytes for all the runs of pixel value $0: The only other pixel value left is $1. Start with a byte of 81. There are six runs of this pixel value, with one not stored: To mark the end of the tree definition, use a byte of FF. Our file now looks like: 00 03 80 43 04 23 01 13 00 54 0D 74 0A 66 3E 36 3D 06 3C 81 12 01 34 0C 24 0B 55 1D 45 1C FF Storing Tile Data Recall our tile data: 0x3, 1x2, 0x5, 1x3, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x6, 1x2, 0x4, 1x6, 0x8, 0x2 1x5, 0x2, 1x2, 0x3, 1x2, 0x5, 1x3, 0x3, 1x4, 0x3, 1x4, 0x3, 1x3, 0x5, 1x7, 0x8, 0x2 1x6, 0x5, 1x2, 0x5, 1x2, 0x5, 1x4, 0x7, 1x2, 0x1, 1x2, 0x3, 1x2, 0x2, 1x5, 0x8, 0x2 Each of these runs will be turned into the corresponding code, then bitpacked. Let's start our bitpacking. In this particular example, the first three runs comprise the first byte of the data: 0x3 = 001 1x2 = 01 0x5 = 100 The bits are 00101100, and the byte is 2C. This is what the data looks like after all runs have been converted to codes: 001, 01, 100, 1011, 1101, 01, 1101, 01, 1101, 01, 1101, 01, 111101, 11101, 1010, 000 11100, 000, 01, 001, 01, 100, 1011, 001, 1100, 001, 1100, 001, 1011, 100, 111111 110 0001, 1010, 000 11101, 100, 01, 100, 01, 100, 1100, 111110, 01, 111100, 01, 001, 01, 000, 11100, 1010, 000 The 1x7 run is encoded as 111111 110 0001. The 111111 is the code for an inline run, 110 is the binary representation of the count minus 1 (7 pixels in this case), and 0001 is binary for the pixel value ($1). That's 178 bits altogether, but it needs to be padded to a multiple of 8 bits, so add 6 0-bits on the end, and when converting to bytes, this is what you get: 2C BD 75 D7 5F 7B 43 81 2C B3 87 0D CF F0 D0 EC 63 33 E7 C4 A3 94 00 That's a total of 23 bytes right there. Putting It Altogether The three sections are listed in order, and this is the final result of our file: 00 03 80 43 04 23 01 13 00 54 0D 74 0A 66 3E 36 3D 06 3C 81 12 01 34 0C 24 0B 55 1D 45 1C FF 2C BD 75 D7 5F 7B 43 81 2C B3 87 0D CF F0 D0 EC 63 33 E7 C4 A3 94 00 That's a grand total of 54 bytes, from the original 96 bytes, so the compressed file is 56.25% as small as the uncompressed one. All in all, that seems like a lot of work, but this format can really save on ROM space. There is an alternate way to encode the tile data before it gets processed into a Nemesis-compressed file. The compressor could either let you choose either mode you want (this could be useful for some purposes), or it could choose whichever encoding mode produces a smaller file size (this is the more desirable option). Exclusive-Or Mode Take the first line of the first tile, which was 00011000. As before, each line of the tile can be thought of as a 8-nybble hexadecimal number. The second line is 00111000. What do you need to exclusive-or (XOR) onto 00011000 to get 00111000? The answer is 00100000. That is, the third pixel from the left is the only one different between these two tile rows. For each row, take the XOR of that row and the previous row, and then the procedure is the exact same as before. Here's what each tile becomes as a result. Tile 1: 00011000 00100000 00100000 00000000 00000000 00000000 01100110 01111110 Tile 2: 01111100 10111010 11001000 00110010 10011000 10011000 00011110 11111110 Tile 3: 01111110 01110010 00010100 00100100 00111010 11000000 10111010 01111100 Keep in mind that the XOR-ing continues from one tile to the next. Now, break the tiles into runs. You get the following: 0x3, 1x2, 0x5, 1x1, 0x5, 0x2, 1x1, 0x8, 0x8, 0x8, 0x5, 0x1, 1x2, 0x2, 1x2, 0x1, 0x1, 1x6, 0x2 1x5, 0x2, 1x1, 0x1, 1x3, 0x1, 1x1, 0x1, 1x2, 0x2, 1x1, 0x5, 1x2, 0x2, 1x1, 0x1, 1x1, 0x2, 1x2, 0x3, 1x1, 0x2, 1x2, 0x6, 1x4, 0x1, 1x7, 0x2 1x6, 0x2, 1x3, 0x2, 1x1, 0x4, 1x1, 0x1, 1x1, 0x4, 1x1, 0x2, 1x1, 0x4, 1x3, 0x1, 1x1, 0x1, 1x2, 0x6, 1x1, 0x1, 1x3, 0x1, 1x1, 0x2, 1x5, 0x2 Count runs: Convert run counts to codes: Figure out your header. There are 3 tiles, but the high bit of the first byte must be set to indicate XOR mode, so instead of 00 03, it's 80 03. Now, for the tree. The runs are stored as follows: The 13 07 is preceded by an 80, the 02 01 is preceded by an 81, and after the 36 2C is an '''FF '''byte. Convert runs to codes: 10010, 001, 0001, 01, 0001, 111, 01, 0000, 0000, 0000, 0001, 110, 001, 111, 001, 110, 110, 10101, 111 10100, 111, 01, 110, 1000, 110, 01, 110, 001, 111, 01, 0001, 001, 111, 01, 110, 01, 111, 001, 10010, 01, 111, 001, 10011, 101100, 110, 101101, 111 10101, 111, 1000, 111, 01, 10111, 01, 110, 01, 10111, 01, 111, 01, 10111, 1000, 110, 01, 110, 001, 10011, 01, 110, 1000, 110, 01, 111, 10100, 111 There are 248 bits altogether, which is a multiple of 8, so no padding is needed. Convert to bytes: 91 14 7A 00 03 8F 3B 57 D3 BA 33 8F 44 F7 3C C9 E6 76 6B 7D 7C 76 EE 6E F6 F1 9C 66 E8 CF A7 And the final file: 80 03 80 13 07 03 06 44 01 74 00 35 17 55 13 25 12 81 02 01 13 01 24 08 55 15 45 14 66 2D 36 2C FF 91 14 7A 00 03 8F 3B 57 D3 BA 33 8F 44 F7 3C C9 E6 76 6B 7D 7C 76 EE 6E F6 F1 9C 66 E8 CF A7 That's a grand total of 64 bytes. Therefore, normal mode would be preferred if you want the smaller file size. Making a Huffman Tree Canonical To make a Huffman tree canonical, do the following procedure: #Sort the symbols according to their code lengths, in order from shortest to longest. #For groups of symbols with the same code length, sort those symbols in order from first to last. #Replace all the bits of the first symbol's code with 0s, preserving length. #Think of the code as a binary number, then: ##If the symbol's code length is the ''same ''as that of the previous symbol, take the previous symbol's code and add 1 to it. ##If the symbol's code length is ''longer ''than that of the previous symbol, take the previous symbol's code, add 1 to it, then append as many 0-bits as needed to preserve the length. #If there are symbols left, repeat step 4 with the next symbol. Modifiying the first Huffman tree made earlier in accordance with the rules above, the result is: Notes: *The symbols involved in this tree are the runs. *Runs of the same code-length are sorted by pixel value first, then count. *The shortest code was 2 bits long, so it becomes 00. *The first instance of moving to a longer code length occurs in the second row of the table. 00 in binary is 0 in decimal. Add 1, so the value becomes 1. Converting back to binary gives 01. The code should be 3 bits long, so append one 0-bit, making it 010. *The first instance of moving to a same-length code is in the third row of the table. The previous code is 010. That's 2 in decimal, and add 1 to make it 3. Convert back to binary, and that's 011. This tree looks neater than the one made previously. Canonical trees can be represented very compactly with clever programming.